Determination of optimum intensity and duration of exercise based on the immune system response using a machine-learning model

One of the important concerns in exercise immunology is determining the appropriate intensity and duration of exercise to prevent suppression of the immune system. A reliable approach to predicting the number of white blood cells (WBCs) during exercise can help identify the appropriate intensity and duration. Therefore, this study was designed to predict leukocyte levels during exercise using a machine-learning model. We used a random forest (RF) model to predict the number of lymphocytes (LYMPH), neutrophils (NEU), monocytes (MON), eosinophils, basophils, and total WBC. Exercise intensity and duration, pre-exercise WBC values, body mass index (BMI), and maximal aerobic capacity (VO2 max) were used as inputs, and post-exercise WBC values were used as outputs of the RF model. Data were collected from 200 eligible people, and K-fold cross-validation was used to train and test the model. Model efficiency was assessed using standard statistics: root mean square error (RMSE), mean absolute error (MAE), relative absolute error (RAE), root relative square error (RRSE), coefficient of determination (R2), and Nash–Sutcliffe efficiency coefficient (NSE). The RF model performed well in predicting the number of WBC, with RMSE = 0.94, MAE = 0.76, RAE = 48.54%, RRSE = 48.17%, NSE = 0.76, and R2 = 0.77. Furthermore, exercise intensity and duration were more effective than BMI and VO2 max in predicting the numbers of LYMPH, NEU, MON, and WBC during exercise. Overall, this study developed a novel approach based on the RF model that uses relevant and accessible variables to predict WBCs during exercise. The proposed method may serve as a promising and cost-effective tool for determining the correct intensity and duration of exercise in healthy people according to the body's immune system response.

www.nature.com/scientificreports/

this relationship is nonlinear and highly complex 13. Consequently, researchers have not yet been able to identify the optimal pattern of exercise for people, even though discovering this pattern is vital, because exercise of proper intensity and duration can boost the immune system and reduce the chance of contracting diseases such as viral infections, cancer, and inflammatory diseases 7.
In recent years, machine learning (ML) models have attracted increasing attention as a new technology and a powerful tool for information processing, prediction, and modelling 14-19. Decision-tree (DT) algorithms are among the main ML tools and have been used in a wide spectrum of clinical applications, including the diagnosis and prediction of cardiovascular diseases and cancers 20. Random forest (RF) has been described as the most successful general-purpose algorithm in modern times 21 and has shown the highest accuracy among the different supervised ML algorithms in most clinical studies 20. ML algorithms can be divided into three categories according to how the model is trained: supervised, unsupervised and semi-supervised. Supervised ML algorithms rely on a response variable that supervises (guides) the analysis 20,22.
Despite the increased use of intelligent techniques in medical decision-support systems 23, very few studies have applied them in exercise immunology 24,25. Furthermore, to the best of our knowledge, no study has used ML models (e.g., RF) to develop an efficient tool for predicting the number of WBCs during exercise. We therefore propose a novel approach based on the RF model to predict the numbers of lymphocytes (LYMPH), neutrophils (NEU), monocytes (MON), eosinophils (EOS), basophils (BASO) and total WBC during exercise in healthy people. The proposed method is easy to apply because it relies on a small set of readily measured inputs. Accordingly, the present study has two main objectives: (1) to develop an RF model to predict the number of WBCs during exercise and (2) to investigate the importance of exercise intensity and duration in predicting the number of WBCs during exercise.

Methods
Subjects. This study involved human participants and was approved by the Research Ethics Committee, and all methods were performed in accordance with the relevant regulations. The objectives and research procedures were clearly explained to all subjects, and all participants provided written consent before the start of the study. A total of 200 eligible healthy subjects (100 men, 50.0%) aged 18-60 years participated in this study. To establish their health history (e.g., the presence of infectious, cardiovascular, inflammatory or immune diseases), subjects were screened with a questionnaire before the study period. Participants were also asked not to take anti-inflammatory agents, steroids or vitamin supplements for 2 weeks before the exercise sessions, and to refrain from exercise training or vigorous physical activity. Descriptive statistics for the 200 participants are summarised in Table 1.
The protocol. Anthropometric indicators (weight, height and BMI) were measured using standard techniques. To evaluate VO2 max, subjects completed a Bruce test to voluntary exhaustion on a calibrated treadmill 26 in the cardiology clinic. Because changes in the immune system depend on exercise intensity (low, moderate, high) 27, the exercise protocol in this study was planned according to the intensities suggested by the American College of Sports Medicine (ACSM): low intensity (50-63% of HRmax), moderate intensity (64-76% of HRmax), and high intensity (77-93% of HRmax) 28. Before the exercise session, each subject's maximum heart rate (HRmax) was computed using the Tanaka method 29. The minimum and maximum target heart rates (HRtarget) for the assigned intensity were then obtained with the Karvonen method 30.
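The Tanaka and Karvonen methods are cited above without formulas. The sketch below assumes their commonly cited forms: HRmax = 208 − 0.7 × age (Tanaka) and HRtarget = HRrest + fraction × (HRmax − HRrest) (Karvonen), in which the intensity fraction is applied to the heart-rate reserve rather than to HRmax itself. Resting heart rate is not reported in the text and is an assumed input; the function names and the example subject are hypothetical.

```python
def hr_max_tanaka(age_years: float) -> float:
    """Age-predicted maximal heart rate, assuming Tanaka's 208 - 0.7 * age."""
    return 208.0 - 0.7 * age_years


def karvonen_target(hr_rest: float, hr_max: float, fraction: float) -> float:
    """Karvonen target HR: resting HR plus a fraction of the heart-rate reserve."""
    return hr_rest + fraction * (hr_max - hr_rest)


# Intensity bands quoted in the text, as fractions (lower, upper).
INTENSITY_BANDS = {
    "low": (0.50, 0.63),
    "moderate": (0.64, 0.76),
    "high": (0.77, 0.93),
}


def target_zone(age_years: float, hr_rest: float, intensity: str) -> tuple[float, float]:
    """Return (HR_target1, HR_target2) for one subject at one intensity level."""
    hr_max = hr_max_tanaka(age_years)
    lower, upper = INTENSITY_BANDS[intensity]
    return (karvonen_target(hr_rest, hr_max, lower),
            karvonen_target(hr_rest, hr_max, upper))


if __name__ == "__main__":
    # Hypothetical subject: 30 years old, resting HR 60 bpm, moderate intensity.
    lo, hi = target_zone(30, 60, "moderate")
    print(f"HR_target1 = {lo:.0f} bpm, HR_target2 = {hi:.0f} bpm")
```

Because each subject performed only one intensity, a single call to `target_zone` per subject would yield the HRtarget1 and HRtarget2 values that later serve as model inputs.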
The participants performed the exercise protocol on a treadmill (Rodby, RL1602E, Sweden) within their determined HRtarget range (i.e., between the minimum and maximum HRtarget). Heart rate was monitored continuously during the protocol with a Polar watch and chest strap (Polar Electro Oy, Kempele, Finland) to ensure that exercise was performed at the intensity specified by ACSM. Subjects were tested individually in a public fitness centre, and each subject performed only one of the above-mentioned intensities. Exercise duration was determined by each subject's capacity, so no fixed duration was set in advance. An individual's capacity is influenced by factors such as age, gender, BMI, and exercise intensity 31. Blood samples (3 ml of peripheral venous blood) were taken at baseline and immediately after completion of the exercise to determine circulating leukocyte counts. Finally, the collected data were used as the inputs and outputs of the RF model to predict WBC levels.

Random forest (RF). RF is a DT-based ensemble algorithm that is highly successful for both classification and regression. Besides having few parameters to tune, it is generally recognized for its accuracy and its ability to deal with small sample sizes 32. The approach combines many randomized decision trees into a forest; each tree makes its own prediction and, for regression tasks such as this one, the final prediction is the average of all tree predictions 19. Before modelling, the data were normalized to the range 0 to 1, because normalization minimizes bias and ensures that all variables are treated on a common scale 33. In WBCs modelling, K-fold cross-validation (K = 5) was applied to train and test the RF model and to guard against over-fitting. The whole dataset was randomly partitioned into 5 equal-sized subsamples (40 cases each). Four subsamples (160 cases) were used for training and the remaining one (40 cases) for testing; this process was repeated 5 times so that each subsample was used once as the test data 19.
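The 5-fold scheme with 0-1 normalization can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' code: the helper name `cross_validated_predictions`, the 200-tree forest and the fixed seeds are hypothetical choices, only the inputs are scaled, and the scaler is fit on the training folds of each split so the test fold's range does not leak into training.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def cross_validated_predictions(X, y, n_splits=5, seed=0):
    """Return an out-of-fold RF prediction for every sample (5-fold by default)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X):
        # Min-max scaling to [0, 1] is fit on the training folds only.
        model = make_pipeline(
            MinMaxScaler(),
            RandomForestRegressor(n_estimators=200, random_state=seed),
        )
        model.fit(X[train_idx], y[train_idx])
        # Each sample is predicted exactly once, by the model that did not see it.
        preds[test_idx] = model.predict(X[test_idx])
    return preds


if __name__ == "__main__":
    # Synthetic stand-in data: 200 subjects, 6 inputs (not the study data).
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(200, 6))
    y = 5 + 3 * X[:, 0] + rng.normal(scale=0.3, size=200)
    y_hat = cross_validated_predictions(X, y)
    print("out-of-fold RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))
```

One design note: tree splits depend only on the ordering of input values, so min-max scaling of the inputs should not materially change RF predictions; it is kept here only to mirror the preprocessing described in the text.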
Model structure and feature importance. Selecting proper input vectors is important in the modelling process of supervised ML algorithms 34. In this simple prediction model, factors shown in past studies to affect WBCs, namely BMI, VO2 max, exercise intensity (HRtarget1 and HRtarget2) and exercise duration, were adopted as inputs. Pre-exercise WBC values were also included as a required input, because the number of WBCs differs between individuals. The number of WBCs after exercise training was used as the model output, and six scenarios were established for modelling (Table 2).
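The text implies one scenario per output cell type, with scenario 1 predicting WBC; the exact layout is given in Table 2. The sketch below assembles input and output column lists under two assumptions: the remaining scenarios follow the order NEU, LYMPH, MON, EOS, BASO, and each scenario uses the pre-exercise count of the same cell type as its baseline input. All column names and the helper `build_scenarios` are hypothetical.

```python
# Cell types modelled, one scenario each (order after WBC is assumed).
CELL_TYPES = ["WBC", "NEU", "LYMPH", "MON", "EOS", "BASO"]

# Inputs shared by every scenario (hypothetical column names).
COMMON_INPUTS = ["BMI", "VO2max", "HR_target1", "HR_target2", "duration"]


def build_scenarios():
    """Map scenario number -> (input columns, output column)."""
    scenarios = {}
    for number, cell in enumerate(CELL_TYPES, start=1):
        # Baseline (pre-exercise) count of the same cell type is an extra input.
        inputs = COMMON_INPUTS + [f"{cell}_pre"]
        # The post-exercise count of that cell type is the prediction target.
        scenarios[number] = (inputs, f"{cell}_post")
    return scenarios


if __name__ == "__main__":
    for number, (inputs, output) in build_scenarios().items():
        print(number, inputs, "->", output)
```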
Because feature rankings are simple and interpretable, feature importance is an important and widely used analysis in machine-learning modelling. Most supervised ML algorithms, including RF, provide a measure of feature importance 19. In this study, the importance of each parameter was estimated based on the mean decrease in impurity (MDI).
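In scikit-learn, `RandomForestRegressor` exposes impurity-based (MDI) importances through its `feature_importances_` attribute, normalised to sum to 1. The sketch below ranks inputs on synthetic data in which the baseline count matters most and duration second; the helper `mdi_ranking`, the feature names and the data are hypothetical, and this is not a reconstruction of Table 4.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def mdi_ranking(X, y, feature_names, seed=0):
    """Fit an RF and return (name, MDI importance) pairs, most important first."""
    model = RandomForestRegressor(n_estimators=300, random_state=seed)
    model.fit(X, y)
    # feature_importances_ is the mean decrease in impurity, averaged over trees.
    pairs = zip(feature_names, model.feature_importances_)
    return sorted(pairs, key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    # Synthetic data: baseline count dominates, duration contributes less.
    rng = np.random.default_rng(0)
    names = ["BMI", "VO2max", "HR_target1", "HR_target2", "duration", "WBC_pre"]
    X = rng.uniform(size=(200, len(names)))
    y = 4 * X[:, 5] + 1.5 * X[:, 4] + rng.normal(scale=0.2, size=200)
    for name, score in mdi_ranking(X, y, names):
        print(f"{name:>10s}  {score:.3f}")
```

MDI is computed on training data and can give some weight to uninformative continuous inputs, so permutation importance on held-out folds is a common cross-check.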
Evaluation criteria. Six quantitative metrics, namely the Pearson coefficient of determination (R2), root mean squared error (RMSE), mean absolute error (MAE), relative absolute error (RAE), root relative square error (RRSE) 34, and the Nash-Sutcliffe efficiency coefficient (NSE), were used to analyse model performance on the testing dataset. Notably, NSE has been used to evaluate ML models in different fields (e.g., hydrology, physics) 33,35-37 and has been shown to be a more reliable efficiency index than R2 33. We therefore also used it to evaluate the results of this study. The equations for the above-mentioned indices are expressed as follows:

Table 1. Characteristics of participants and input and output data. BMI = body mass index. VO2 max = maximal aerobic capacity. HRtarget1 = the minimum target heart rate of subjects at the determined intensity. HRtarget2 = the maximum target heart rate of subjects at the determined intensity. Duration = exercise training duration. WBC 1
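As a reference, the conventional definitions of the six evaluation indices are sketched below, assuming $O_i$ and $P_i$ denote the observed and predicted counts for test case $i$, $\bar{O}$ and $\bar{P}$ their means, and $n$ the number of test cases; these symbols are ours and may differ from the authors' notation.

```latex
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|O_i - P_i\right|,\\
\mathrm{RAE} &= 100\,\frac{\sum_{i=1}^{n}\left|P_i - O_i\right|}{\sum_{i=1}^{n}\left|O_i - \bar{O}\right|}, \qquad
\mathrm{RRSE} = 100\,\sqrt{\frac{\sum_{i=1}^{n}\left(P_i - O_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}},\\
\mathrm{NSE} &= 1 - \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}, \qquad
R^2 = \frac{\left[\sum_{i=1}^{n}\left(O_i - \bar{O}\right)\left(P_i - \bar{P}\right)\right]^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2\,\sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2}.
\end{aligned}
```

Under these definitions, NSE = 1 − (RRSE/100)², so NSE and RRSE carry the same information, while RAE and RRSE are expressed in percent, matching the units used for Table 3.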

Results
Results of scenario analysis. The performance of the RF model in predicting the number of WBCs was evaluated using the performance indices RMSE, MAE, RAE, RRSE, NSE, and R2. Their values for all scenarios during the testing phase are shown in Table 3.
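Indices of the kind reported in Table 3 can be computed from a set of test-fold predictions with the following standard-library sketch. It assumes the conventional definitions (RAE and RRSE in percent, R2 as the squared Pearson correlation); the function name `evaluation_metrics` and the toy values are hypothetical.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return RMSE, MAE, RAE (%), RRSE (%), NSE and Pearson R^2 for one test set."""
    n = len(observed)
    if n < 2 or n != len(predicted):
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)                  # sum of squared errors
    sae = sum(abs(e) for e in errors)                 # sum of absolute errors
    sst = sum((o - mean_o) ** 2 for o in observed)    # squared spread of observations
    sad = sum(abs(o - mean_o) for o in observed)      # absolute spread of observations
    if sst == 0:
        raise ValueError("observed values are constant; relative indices undefined")
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sae / n,
        "RAE": 100.0 * sae / sad,
        "RRSE": 100.0 * math.sqrt(sse / sst),
        "NSE": 1.0 - sse / sst,
        # Pearson R^2 is undefined when all predictions are identical.
        "R2": cov * cov / (sst * var_p) if var_p > 0 else float("nan"),
    }


if __name__ == "__main__":
    # Toy counts in 10^3/mm^3, not study data.
    obs = [1.0, 2.0, 3.0, 4.0]
    pred = [1.0, 2.0, 3.0, 5.0]
    for name, value in evaluation_metrics(obs, pred).items():
        print(f"{name}: {value:.3f}")
```

A design point worth noting: R2 computed this way is unchanged if every prediction is shifted or rescaled by the same linear transform, whereas NSE penalizes such systematic bias, which is one reason NSE is often preferred for judging agreement between actual and predicted values.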

Results of feature importance analysis. We also estimated feature importance in all scenarios. The feature importance scores are given in Table 4 and shown graphically in Fig. 1. In addition, to assess the performance of the developed model and identify its best scenario, correlations between actual and predicted values of WBC, NEU, LYMPH, MON, BASO, and EOS during the testing phase are presented in Fig. 2. Comparison among all tested models showed that the model for predicting BASO had the lowest correlation (R2 = 0.11) and the model for predicting WBC had the highest (R2 = 0.77); predicted WBC values agreed more closely with actual values than did the predictions for NEU, LYMPH, MON, BASO, and EOS.
Moreover, Fig. 3 shows the variation of actual and predicted values for the best scenario (i.e., WBC) during the testing phase.

Discussion
Evaluation of the results. Based on the results, the most important feature for predicting WBC, LYMPH, NEU, and MON was the pre-exercise value of the same cell type, followed by exercise intensity and duration. For EOS, the most important feature was the pre-exercise EOS value, followed by VO2 max and BMI, whereas for BASO no feature was effective. These results are consistent with the body's physiological function. Regulation of the immune response by the central nervous system is mediated by bidirectional signalling among the nervous, endocrine and immune systems 38. Two important pathways of immune system dysregulation are the hypothalamic-pituitary-adrenal axis and the autonomic nervous system. Exercise can activate the hypothalamic-pituitary-adrenal axis and the sympathetic nervous system, which stimulate the secretion of hormones such as catecholamines (adrenaline and noradrenaline), adrenocorticotropic hormone, and cortisol. Each of these hormones can cause quantitative and qualitative changes in immune function 39. For example, increases in adrenaline and, to a lesser degree, noradrenaline are the main drivers of LYMPH dynamics during acute exercise 40. Some studies have also shown that cortisol causes neutrophilia, primarily through demargination of cells from the blood-vessel walls, with a minor contribution from the bone marrow 41. Most researchers in exercise immunology consider that the immune response reflects the magnitude of physiological stress experienced by the exerciser 42.

Table 3. Performance of the RF model for prediction of WBCs levels. R2 = Pearson coefficient of determination. RMSE = root mean squared error. MAE = mean absolute error. NSE = Nash-Sutcliffe efficiency coefficient. RAE = relative absolute error. RRSE = root relative square error.
Exercise-induced muscle tissue injury and inflammation elicit a strong immune response involving NEU, EOS, BASO, MON, and macrophages. Lipid mediators (e.g., oxylipins) and immune-related proteins are produced to modulate the innate immune response and are involved in initiating, mediating, and resolving this process 43,44. Most of the expressed immune-related proteins (e.g., lysozyme C, neutrophil elastase, defensin-1, cathelicidin antimicrobial peptide, α-actinin-1, and profilin-1) are involved in pathogen defense and in immune cell chemotaxis and locomotion. Other proteins (e.g., serum amyloid A-4, myeloperoxidase, plasma protease C1 inhibitor, α-2-HS-glycoprotein, and α-1-acid glycoprotein 2) increase during recovery and affect the acute-phase inflammatory response 43. This profound exercise-induced perturbation in metabolites, lipid mediators, and proteins likely has a direct influence on immune function and results in transient immune dysfunction 45. The low importance of exercise intensity and duration for predicting EOS may be because EOS are more involved in allergic diseases and parasitic infections 46. It may also indicate that these cells require a more severe stress than that induced in this study 47. Finally, the strong influence of exercise intensity and duration on predicted WBC levels is justifiable given the effect of exercise on NEU, LYMPH, and MON, which make up most circulating leukocytes (NEU about 60%, LYMPH about 30%, and MON about 5.3% 48; roughly 95% combined).

Scenario number | RMSE (10³/mm³) | MAE (10³/mm³) | RAE (%) | RRSE (%) | NSE
A comparison among scenarios based on standard statistics (RAE, RRSE, NSE and R2) showed that scenario 1 (predicting WBC) had the highest performance, whereas the RF results for predicting BASO were not acceptable. Overall, based on the NSE metric, the RF model performed well in predicting NEU, LYMPH, MON, and EOS levels (good: 0.65 < NSE ≤ 0.75) and very well in predicting WBC (very good: 0.75 < NSE ≤ 1.00) 33,49.
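The two NSE bands quoted above can be applied mechanically, as in this minimal sketch. The function name `rate_nse` is hypothetical, and values at or below 0.65 are simply labelled "below good" because lower bands are not quoted in the text. Note the half-open intervals: an NSE of exactly 0.75 falls in "good", not "very good".

```python
def rate_nse(nse: float) -> str:
    """Label an NSE value using the two bands quoted in the text (refs 33, 49)."""
    if 0.75 < nse <= 1.00:
        return "very good"
    if 0.65 < nse <= 0.75:
        return "good"
    # Lower bands are not quoted in the text, so they are not distinguished here.
    return "below good"


if __name__ == "__main__":
    for value in (0.76, 0.75, 0.70, 0.40):  # illustrative values
        print(value, rate_nse(value))
```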
Fig. 3 shows that, although agreement between actual and predicted WBC values is relatively good, some predictions were inaccurate. This often occurs in modelling and is partly due to the limited size of the dataset 50. More precise data 51 could also produce better results. Moreover, the choice of ML model (e.g., M5 Prime (M5P)) and the use of hybrid algorithms (e.g., random committee (RC)-RF) may enhance modelling accuracy 20. In this study, model parameters were optimized by trial and error; metaheuristic optimization algorithms (e.g., the genetic algorithm (GA)) 37 could increase the efficiency of the ML model. In addition, because obesity is an inflammatory condition that can interfere with the results, using a more precise characteristic such as body fat percentage 12 instead of BMI as an input may improve the results. Finally, WBCs can also be influenced by other factors, including the menstrual cycle in females (progesterone concentration) 52, diet, psychological stress, and environmental stress (e.g., temperature and relative humidity) 11; these were not controlled in our study, and controlling them may increase the accuracy of predicting WBCs with the RF model.
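As a simple, systematic alternative to manual trial and error, RF hyperparameters can be tuned with a cross-validated grid search. This is a swapped-in technique (exhaustive grid search, not a genetic algorithm), and the grid values and the helper `tune_random_forest` are hypothetical choices rather than the authors' settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold


def tune_random_forest(X, y, seed=0):
    """Cross-validated grid search over a few RF hyperparameters (hypothetical grid)."""
    grid = {
        "n_estimators": [50, 100],
        "max_depth": [None, 5],
        "min_samples_leaf": [1, 5],
    }
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        param_grid=grid,
        cv=KFold(n_splits=5, shuffle=True, random_state=seed),  # 5-fold scheme
        scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
    )
    search.fit(X, y)  # by default the best combination is refit on all data
    return search


if __name__ == "__main__":
    # Synthetic stand-in data (not the study data).
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(200, 6))
    y = 5 + 3 * X[:, 0] + rng.normal(scale=0.3, size=200)
    search = tune_random_forest(X, y)
    print("best parameters:", search.best_params_)
    print("cross-validated RMSE:", -search.best_score_)
```

Grid search scales poorly as more hyperparameters are added, which is where metaheuristics such as GA, or randomized search, become more attractive.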
Overall, as an initial step, the present study supports the ability of an ML model to predict the number of WBCs during exercise. Furthermore, by identifying the appropriate intensity and duration of exercise, the proposed RF model may help reduce the incidence of disease.

Conclusion
Determining the optimal pattern of exercise training (i.e., an intensity and duration that do not suppress immune function) is very important for maintaining people's health. Given that no solution to this problem has been presented until now, this study was designed to develop a new method based on
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.